Abstract

A method is proposed for the estimation of a general class of scalar linear time series models. The model takes the form of a stochastic difference equation for the dependent variable with exogenous variable inputs, and the disturbances are autocorrelated through an autoregressive moving average process. In the present paper an asymptotically efficient yet computationally simple estimation procedure (in the time domain) is derived for this model. The resulting estimator is shown to be asymptotically equivalent to the maximum likelihood estimator and to possess a limiting multivariate normal distribution.
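1. INTRODUCTION

We consider the estimation of the parameters in the model

$$y_t = \sum_{i=1}^{r}\alpha_i y_{t-i} + \sum_{i=1}^{k}\beta_i x_{ti} + u_t, \tag{1}$$

$$u_t = \sum_{i=1}^{p}\phi_i u_{t-i} + \epsilon_t + \sum_{i=1}^{q}\gamma_i\epsilon_{t-i}, \qquad (t = \dots, -1, 0, 1, \dots). \tag{2}$$

With the use of the lag operator $\mathcal{L}$ such that $\mathcal{L}y_t = y_{t-1}$, the model can also be written as

$$A(\mathcal{L})y_t = \sum_{i=1}^{k}\beta_i x_{ti} + u_t, \tag{3}$$

where $A(\mathcal{L}) = 1 - \alpha_1\mathcal{L} - \cdots - \alpha_r\mathcal{L}^r$, and

$$\Phi(\mathcal{L})u_t = \Gamma(\mathcal{L})\epsilon_t, \tag{4}$$

where $\Phi(\mathcal{L}) = 1 - \phi_1\mathcal{L} - \cdots - \phi_p\mathcal{L}^p$ and $\Gamma(\mathcal{L}) = 1 + \gamma_1\mathcal{L} + \cdots + \gamma_q\mathcal{L}^q$.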
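To make the roles of (1)-(4) concrete, the following Python sketch simulates the model. It is a minimal illustration under stated assumptions, not part of the estimation method: the function name `simulate_model` is hypothetical, the exogenous inputs are assumed to be sinusoids (nonstochastic sequences in the spirit of assumption (iii) below), pre-sample values are set to zero, and a burn-in period is discarded so that the returned series is close to the steady-state solution.

```python
import numpy as np

def simulate_model(alpha, beta, phi, gamma, T, sigma=1.0, burn=200, seed=0):
    """Simulate (1)-(2):
        y_t = sum_i alpha_i y_{t-i} + sum_i beta_i x_{ti} + u_t,
        u_t = sum_i phi_i u_{t-i} + eps_t + sum_i gamma_i eps_{t-i}.
    Exogenous inputs are hypothetical sinusoids; pre-sample values are 0 and
    the first `burn` values are discarded."""
    rng = np.random.default_rng(seed)
    alpha, beta, phi, gamma = (np.asarray(c, dtype=float) for c in (alpha, beta, phi, gamma))
    n = T + burn
    t_idx = np.arange(n)
    if len(beta):
        X = np.column_stack([np.sin(0.3 * (j + 1) * t_idx) for j in range(len(beta))])
    else:
        X = np.zeros((n, 0))
    eps = sigma * rng.standard_normal(n)
    u = np.zeros(n)
    y = np.zeros(n)
    for s in range(n):
        # ARMA disturbance, equation (2)
        u[s] = eps[s]
        u[s] += sum(phi[i - 1] * u[s - i] for i in range(1, len(phi) + 1) if s - i >= 0)
        u[s] += sum(gamma[i - 1] * eps[s - i] for i in range(1, len(gamma) + 1) if s - i >= 0)
        # stochastic difference equation, equation (1)
        y[s] = X[s] @ beta + u[s]
        y[s] += sum(alpha[i - 1] * y[s - i] for i in range(1, len(alpha) + 1) if s - i >= 0)
    return y[burn:], X[burn:], u[burn:], eps[burn:]
```

For example, `simulate_model([0.5], [1.0], [0.3], [0.4], T=300)` generates a first-order difference equation with one exogenous input and ARMA(1,1) disturbances.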


Assumptions used in the estimation of this model are

(i) the $\epsilon_t$ are independent and identically distributed with mean 0 and common variance $\sigma^2$,

(ii) all roots of $A(z) = 0$, $\Phi(z) = 0$, and $\Gamma(z) = 0$ are greater than 1 in absolute value and there are no roots common to the three equations,

(iii) the exogenous variables $x_{ti}$ are nonstochastic sequences such that the limit
$$\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}x_{t+m,i}\,x_{t+n,j} = \rho_{ij}(n-m) = \rho_{ji}(m-n)$$
exists for $m, n = \dots, -1, 0, 1, \dots$, with $\rho_{ii}(0) > 0$.

These assumptions imply the following:

(iv) the infinite series $A(z)^{-1} = \sum_{i=0}^{\infty}\lambda_i z^i$, $\Phi(z)^{-1} = \sum_{i=0}^{\infty}\psi_i z^i$, and $\Gamma(z)^{-1} = \sum_{i=0}^{\infty}\delta_i z^i$ all converge for $|z| < 1 + \Delta$, $\Delta > 0$,

(v) the endogenous variable can be expressed as the "steady state solution" to the difference equation (3) as
$$y_t = \sum_{i=1}^{k}\beta_i A(\mathcal{L})^{-1}x_{ti} + A(\mathcal{L})^{-1}u_t = \sum_{i=1}^{k}\sum_{j=0}^{\infty}\beta_i\lambda_j x_{t-j,i} + \sum_{j=0}^{\infty}\lambda_j u_{t-j}, \tag{5}$$
while the disturbance $u_t$ can be expressed as the stationary solution to the autoregressive moving average equation (4) as
$$u_t = \Phi(\mathcal{L})^{-1}\Gamma(\mathcal{L})\epsilon_t = \sum_{j=0}^{\infty}\sum_{l=0}^{q}\psi_j\gamma_l\epsilon_{t-j-l}, \qquad (\gamma_0 = 1), \tag{6}$$

(vi) there exists a spectral distribution matrix $F_x(\lambda) = [F_{mn}(\lambda)]$ such that $\rho_{mn}(h) = \int_{-\pi}^{\pi}e^{ih\lambda}\,dF_{mn}(\lambda)$ $(m, n = 1, \dots, k;\ h = 0, \pm 1, \dots)$. This can be expressed more compactly as $P(h) = \int_{-\pi}^{\pi}e^{ih\lambda}\,dF_x(\lambda)$, where $P(h)$ denotes the matrix whose $(m,n)$th element is $\rho_{mn}(h)$. (See Hannan [14], Chapter 2, or Anderson [3], Chapter 7, for details concerning this assertion.)
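Assumption (ii) and the inverse-series coefficients in (iv) can be checked numerically. The sketch below, with hypothetical helper names and each lag polynomial stored as an array of ascending coefficients, tests whether all roots lie outside the unit circle and computes the first coefficients ($\lambda_i$, $\psi_i$ or $\delta_i$) of $1/P(z)$ by matching powers of $z$ in $P(z)\sum_i c_i z^i = 1$.

```python
import numpy as np

def lag_poly(coefs, sign):
    """Ascending coefficients of 1 + sign*(c_1 z + ... + c_d z^d).
    sign = -1 gives A(z) or Phi(z); sign = +1 gives Gamma(z)."""
    return np.concatenate(([1.0], sign * np.asarray(coefs, dtype=float)))

def roots_outside_unit_circle(poly_asc):
    """Assumption (ii) for one polynomial: every root has modulus > 1."""
    roots = np.roots(poly_asc[::-1])   # np.roots expects the highest power first
    return bool(np.all(np.abs(roots) > 1.0))

def inverse_series(poly_asc, n_terms):
    """First n_terms coefficients c_j of 1/P(z): c_0 = 1, c_j = -sum_{i>=1} p_i c_{j-i}."""
    d = len(poly_asc) - 1
    c = np.zeros(n_terms)
    c[0] = 1.0
    for j in range(1, n_terms):
        c[j] = -sum(poly_asc[i] * c[j - i] for i in range(1, min(j, d) + 1))
    return c
```

For $A(z) = 1 - 0.5z$ this gives $\lambda_j = 0.5^j$, and for $\Gamma(z) = 1 + 0.4z$ it gives $\delta_j = (-0.4)^j$.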
Several techniques have previously been introduced for estimating special cases of the above model (3)-(4). For example, when $\beta_1 = \cdots = \beta_k = 0$ and $\Phi(z) = 1$ the model reduces to
$$A(\mathcal{L})y_t = \Gamma(\mathcal{L})\epsilon_t,$$
the classical time series ARMA model. Early methods of estimation of this model include those proposed by Durbin [9], [10], and Walker [30], [31]. Durbin's method relies on approximating the moving average process $v_t = \epsilon_t + \sum_{i=1}^{q}\gamma_i\epsilon_{t-i}$ by a high-order autoregression, while Walker's procedure maximizes, with respect to the $\alpha_i$ and $\gamma_i$, the approximate likelihood function of the first $n$ sample serial correlations of $y_t$. More recently, Hannan [13], Clevenson [7], and Parzen [24] have constructed estimates based on Fourier transformation of the data and spectral methods. Akaike [1] has shown that Hannan's procedure is approximately a Newton-Raphson method in the frequency domain, while the methods of Clevenson and Parzen are approximations to the method of scoring in the frequency domain (using an alternative parametrization of the moving average process). The method of Box and Jenkins [5] is to maximize the likelihood function by computing its value at a grid of trial values of the parameters. They also consider nonlinear least squares estimation of the parameters by numerical methods. Anderson [4] estimates the parameters using the method of scoring and Newton-Raphson methods (in the time domain) under the assumption $y_0 = y_{-1} = \cdots = y_{1-r} = 0$ and $\epsilon_0 = \epsilon_{-1} = \cdots = \epsilon_{1-q} = 0$. The method that will be presented later in this paper for estimating the more general linear model (3)-(4) is related to Anderson's method for this special case.
For the model (3)-(4) containing exogenous variables, methods of estimation have been presented only in special cases. For the difference equation model (5) with pure autoregressive errors in (4), Wallis [53] has suggested an Aitken generalized least squares estimator of the parameters in (3) using an estimate of the covariance matrix of the error term $u_t$. However, as noted by Amemiya and Fuller [2], Maddala [21] and others, this method is not asymptotically efficient due to the presence of lagged values of the endogenous variable $y_t$ as regressors. Recently Hatanaka [17] has presented an efficient two-step estimation procedure for this model which is identical to the procedure that will be proposed in the present paper for this special case. It will be shown that Hatanaka's procedure is approximately a Newton-Raphson method in the time domain.

For the special case of a moving average errors model in (4) with equality of moving average and difference equation coefficients (i.e., the distributed lag model with $\gamma_i = -\alpha_i$), Dhrymes [8] has presented a method of estimation based on Newton-Raphson techniques which is similar to the method to be proposed. Estimation of the model containing a general moving average errors process in (4) has been considered by Hannan and Nicholls [16] using Fourier transformed data. Phillips [25], Trivedi [29], and Hendry and Trivedi [18] have estimated this same model by iterative solution of the maximum likelihood equations, their methods being somewhat similar in this special case to the method to be presented in this paper. An excellent summary of methods of estimation of the difference equation model (5) with moving average disturbances in (4) is provided by Nicholls, Pagan, and Terrell [25]. Box and Jenkins [5] and Pierce [27] have considered the estimation by iterative nonlinear least squares methods of a general transfer function model which is similar but not identical to the model to be considered here.

The purpose of the present paper is to obtain an estimation procedure for the linear time series model (3)-(4) which is asymptotically efficient yet computationally simple. The method to be proposed uses the maximum likelihood approach and is based on Newton-Raphson techniques applied to the likelihood equations. The resulting "Newton-Raphson" estimator is shown to be asymptotically equivalent to the maximum likelihood estimator and to possess a limiting multivariate normal distribution.

2. THE METHOD OF ESTIMATION

For the estimation of the parameters in the model (3)-(4), let us suppose that the observations $y_t, y_{t-1}, \dots, y_{t-r}, x_{t1}, \dots, x_{tk}$ are available for $t = 1, \dots, T$.

To motivate the estimation procedure we will assume that the $\epsilon_t$ are normally distributed and use the maximum likelihood approach. To simplify the form of the likelihood function, certain assumptions will be made concerning the initial observations and disturbances. This is necessary because of the dynamic time series structure of the model, and in particular because of the complicated form of the inverse of the covariance matrix of consecutive observations from a moving average process. First, we consider the initial observations $y_{1-r}, \dots, y_0, \dots, y_p$ as fixed, and estimate from the likelihood function conditional on these values. Second, we assume that the initial disturbances $\epsilon_{p+1-q}, \dots, \epsilon_p$ are equal to their unconditional expectations, which are 0. Then, introducing the $(T-p)\times(T-p)$ lag matrix $L$ which has 1's on the diagonal directly below the main diagonal and 0's elsewhere, we define the $(T-p)\times(T-p)$ matrix
$$G = I + \sum_{i=1}^{q}\gamma_i L^i.$$
(Note that by condition (iv) of Section 1, $G^{-1} = \sum_{i=0}^{T-p-1}\delta_i L^i$.)
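A small numerical check of the matrix $G$ and of the note above: because $L$ is nilpotent ($L^{T-p} = 0$), the series for $\Gamma(z)^{-1}$ truncates exactly, so $\sum_{i=0}^{T-p-1}\delta_iL^i$ is the exact inverse. The helper names are hypothetical and $n$ plays the role of $T-p$.

```python
import numpy as np

def G_matrix(gamma, n):
    """G = I + sum_{i=1}^q gamma_i L^i; np.eye(n, k=-i) is L^i (ones i places below the diagonal)."""
    G = np.eye(n)
    for i, g in enumerate(gamma, start=1):
        G += g * np.eye(n, k=-i)
    return G

def G_inverse(gamma, n):
    """G^{-1} = sum_{i=0}^{n-1} delta_i L^i, with delta_i the coefficients of 1/Gamma(z)."""
    q = len(gamma)
    delta = np.zeros(n)
    delta[0] = 1.0
    for j in range(1, n):
        delta[j] = -sum(gamma[i - 1] * delta[j - i] for i in range(1, min(j, q) + 1))
    return sum(delta[i] * np.eye(n, k=-i) for i in range(n))
```

Since $G$ is lower triangular with unit diagonal, $|G| = 1$; this is why no Jacobian factor appears in the likelihood function below.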

i=0 

Thus, defining the vectors
$$Y = (y_{p+1}, \dots, y_T)', \qquad X_i = (x_{p+1,i}, \dots, x_{Ti})' \quad (i = 1, \dots, k),$$
$$U = (u_{p+1}, \dots, u_T)', \qquad \epsilon = (\epsilon_{p+1}, \dots, \epsilon_T)',$$
we can express the entire (modified) model in vector form as
$$Y - \sum_{i=1}^{r}\alpha_i\mathcal{L}^iY = \sum_{i=1}^{k}\beta_iX_i + U, \tag{7}$$
$$U - \sum_{i=1}^{p}\phi_i\mathcal{L}^iU = \epsilon + \sum_{i=1}^{q}\gamma_iL^i\epsilon, \tag{8}$$
where it should be noted that $\mathcal{L}Y = (y_p, \dots, y_{T-1})'$ and $\mathcal{L}U = (u_p, \dots, u_{T-1})'$, while $L\epsilon = (0, \epsilon_{p+1}, \dots, \epsilon_{T-1})'$. The equations (7)-(8) can be written more compactly as
$$A(\mathcal{L})Y = \sum_{i=1}^{k}\beta_iX_i + U, \tag{9}$$
$$\Phi(\mathcal{L})U = G\epsilon. \tag{10}$$

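On the assumption of normality of the $\epsilon_t$, the (modified) likelihood function of the observations $y_{p+1}, \dots, y_T$ given $y_{1-r}, \dots, y_p$ is
$$F = \frac{1}{(2\pi)^{\frac12(T-p)}(\sigma^2)^{\frac12(T-p)}}\exp\left[-\frac{1}{2\sigma^2}(\Phi(\mathcal{L})U)'G'^{-1}G^{-1}(\Phi(\mathcal{L})U)\right],$$
where $U$ is expressible in terms of the observable quantities $Y$ and $X_i$ through equation (9). (No Jacobian factor appears because $G$ is lower triangular with unit diagonal, so that $|G| = 1$.) Then, using the fact that
$$\frac{\partial}{\partial\gamma_m}G^{-1} = -G^{-1}\Big(\frac{\partial}{\partial\gamma_m}G\Big)G^{-1} \qquad\text{and}\qquad \frac{\partial}{\partial\gamma_m}G = L^m,$$
we obtain the partial derivatives
$$\frac{\partial\log F}{\partial\alpha_m} = \frac{1}{\sigma^2}(\mathcal{L}^m\Phi(\mathcal{L})Y)'G'^{-1}G^{-1}(\Phi(\mathcal{L})U), \qquad (m = 1, \dots, r),$$
$$\frac{\partial\log F}{\partial\beta_m} = \frac{1}{\sigma^2}(\Phi(\mathcal{L})X_m)'G'^{-1}G^{-1}(\Phi(\mathcal{L})U), \qquad (m = 1, \dots, k),$$
$$\frac{\partial\log F}{\partial\phi_m} = \frac{1}{\sigma^2}(\mathcal{L}^mU)'G'^{-1}G^{-1}(\Phi(\mathcal{L})U), \qquad (m = 1, \dots, p),$$
$$\frac{\partial\log F}{\partial\gamma_m} = \frac{1}{\sigma^2}(\Phi(\mathcal{L})U)'G'^{-1}L'^mG'^{-1}G^{-1}(\Phi(\mathcal{L})U), \qquad (m = 1, \dots, q).$$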

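Defining the vector
$$\theta = (\alpha_1, \dots, \alpha_r, \beta_1, \dots, \beta_k, \phi_1, \dots, \phi_p, \gamma_1, \dots, \gamma_q)'$$
and the matrix
$$W = (\mathcal{L}\Phi(\mathcal{L})Y, \dots, \mathcal{L}^r\Phi(\mathcal{L})Y, \Phi(\mathcal{L})X_1, \dots, \Phi(\mathcal{L})X_k, \mathcal{L}U, \dots, \mathcal{L}^pU, L\epsilon, \dots, L^q\epsilon),$$
we can express these derivatives in vector form as
$$\frac{\partial\log F}{\partial\theta} = \frac{1}{\sigma^2}W'G'^{-1}G^{-1}(\Phi(\mathcal{L})U). \tag{11}$$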
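As a sanity check of (11), the sketch below evaluates the conditional log-likelihood $\log F$ (for fixed $\sigma^2$) through the recursions implied by (9)-(10) and compares the analytic score $\sigma^{-2}W'G'^{-1}\epsilon$ (note $G^{-1}\Phi(\mathcal{L})U = \epsilon$) with a central finite-difference gradient. Assumptions: the first $r+p$ array entries play the role of the fixed initial values $y_{1-r}, \dots, y_p$, pre-sample disturbances are zero, $\theta$ is stacked as $(\alpha, \beta, \phi, \gamma)$, and all function names are hypothetical. The identity holds for any data, so random numbers suffice for the check.

```python
import numpy as np

def split_theta(theta, r, k, p):
    """theta = (alpha_1..alpha_r, beta_1..beta_k, phi_1..phi_p, gamma_1..gamma_q)."""
    return np.split(np.asarray(theta, dtype=float), np.cumsum([r, k, p]))

def phi_filter(z, f, t):
    """Phi(L) z_t = z_t - sum_i f_i z_{t-i} (requires t >= len(f))."""
    return z[t] - f @ z[t - len(f):t][::-1]

def residuals(theta, y, X, r, p, q):
    """U from (9) and eps = G^{-1} Phi(L) U from (10), with zero initial disturbances."""
    k = X.shape[1]
    a, b, f, g = split_theta(theta, r, k, p)
    N, s0 = len(y), r + p
    u = np.zeros(N)
    for t in range(r, N):
        u[t] = y[t] - a @ y[t - r:t][::-1] - X[t] @ b
    eps = np.zeros(N - s0)
    for j, t in enumerate(range(s0, N)):
        ma = sum(g[i - 1] * eps[j - i] for i in range(1, q + 1) if j - i >= 0)
        eps[j] = phi_filter(u, f, t) - ma          # solves G eps = Phi(L) U recursively
    return u, eps

def loglik(theta, y, X, r, p, q, sigma2):
    """Conditional log F for fixed sigma^2 (|G| = 1, so there is no Jacobian term)."""
    _, eps = residuals(theta, y, X, r, p, q)
    return -0.5 * len(eps) * np.log(2 * np.pi * sigma2) - eps @ eps / (2 * sigma2)

def score(theta, y, X, r, p, q, sigma2):
    """Analytic score (11): W' G'^{-1} eps / sigma^2."""
    k = X.shape[1]
    _, _, f, g = split_theta(theta, r, k, p)
    u, eps = residuals(theta, y, X, r, p, q)
    N, s0 = len(y), r + p
    n = N - s0
    W = np.zeros((n, r + k + p + q))
    for j, t in enumerate(range(s0, N)):
        W[j, :r] = [phi_filter(y, f, t - i) for i in range(1, r + 1)]                # L^i Phi(L) Y
        W[j, r:r + k] = [phi_filter(X[:, c], f, t) for c in range(k)]                 # Phi(L) X_c
        W[j, r + k:r + k + p] = [u[t - i] for i in range(1, p + 1)]                   # L^i U
        W[j, r + k + p:] = [eps[j - i] if j >= i else 0.0 for i in range(1, q + 1)]   # L^i eps
    G = np.eye(n) + sum(g[i - 1] * np.eye(n, k=-i) for i in range(1, q + 1))
    return W.T @ np.linalg.solve(G.T, eps) / sigma2

def numerical_score(theta, y, X, r, p, q, sigma2, h=1e-5):
    """Central finite differences of loglik, for comparison with score()."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = h
        grad[j] = (loglik(theta + step, y, X, r, p, q, sigma2)
                   - loglik(theta - step, y, X, r, p, q, sigma2)) / (2 * h)
    return grad
```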


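Setting these derivatives equal to 0 leads to maximum likelihood equations which are nonlinear in the parameters $\theta$. Thus these equations can only be solved by numerical procedures such as the Newton-Raphson method. The Newton-Raphson method for solving equation (11) is based on the Taylor expansion
$$\frac{\partial\log F}{\partial\theta} = \frac{\partial\log F}{\partial\theta}\bigg|_{\theta_0} + \frac{\partial^2\log F}{\partial\theta\,\partial\theta'}\bigg|_{\theta^*}(\theta - \theta_0),$$
where $\|\theta^* - \theta_0\| \le \|\theta - \theta_0\|$ and $\|\cdot\|$ denotes the usual Euclidean norm. The Newton-Raphson equations for an approximate maximum likelihood estimator $\hat\theta$ are
$$-\frac{\partial^2\log F}{\partial\theta\,\partial\theta'}\bigg|_{\theta_0}(\hat\theta - \theta_0) = \frac{\partial\log F}{\partial\theta}\bigg|_{\theta_0}, \tag{12}$$
where $\theta_0$ is an initial estimate of $\theta$. Thus the Hessian of $\log F$, $\partial^2\log F/\partial\theta\,\partial\theta'$, plays a dominant role in the Newton-Raphson method. It can be shown that an approximation to $-\partial^2\log F/\partial\theta\,\partial\theta'$ is given by $\frac{1}{\sigma^2}W'G'^{-1}G^{-1}W$. This approximation involves omitting certain terms in the Hessian of $\log F$ which, when divided by $T$, converge to 0 in probability as $T\to\infty$. We will express this "asymptotic" approximation as
$$-\frac{\partial^2\log F}{\partial\theta\,\partial\theta'} \doteq \frac{1}{\sigma^2}W'G'^{-1}G^{-1}W. \tag{13}$$

To obtain the Newton-Raphson estimator of $\theta$, we assume that we have an initial estimate
$$\theta_0 = (\tilde\alpha_1, \dots, \tilde\alpha_r, \tilde\beta_1, \dots, \tilde\beta_k, \tilde\phi_1, \dots, \tilde\phi_p, \tilde\gamma_1, \dots, \tilde\gamma_q)',$$
which is a consistent estimate of $\theta$ to the order $T^{-1/2}$ in probability, i.e., $\theta_0 - \theta = O_p(T^{-1/2})$. This estimate may be obtained as follows: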

(a) Obtain consistent estimates $\tilde\alpha_1, \dots, \tilde\alpha_r, \tilde\beta_1, \dots, \tilde\beta_k$ from equation (7) using the method of instrumental variables estimation (see Liviatan [20] and Dhrymes [8], Chapter 5). Then compute the residuals
$$\tilde u_t = y_t - \sum_{i=1}^{r}\tilde\alpha_iy_{t-i} - \sum_{i=1}^{k}\tilde\beta_ix_{ti}, \qquad (t = 1, \dots, T).$$

(b) Using the calculated residuals $\tilde u_t$, we obtain consistent estimates of the autocovariances $\sigma(s) = E(u_tu_{t-s})$, $s = 0, 1, \dots$, of the $u_t$ as
$$c(s) = \frac{1}{T}\sum_{t=s+1}^{T}(\tilde u_t - \bar u)(\tilde u_{t-s} - \bar u) = c(-s), \qquad (s = 0, 1, \dots, p+q),$$
where $\bar u = \frac{1}{T}\sum_{t=1}^{T}\tilde u_t$. We then estimate the parameters $\phi_i$ consistently by solving the Yule-Walker type equations
$$c(s) - \sum_{i=1}^{p}\tilde\phi_ic(s-i) = 0, \qquad (s = q+1, \dots, q+p).$$

(c) Having obtained the estimates $\tilde\phi_i$, we form
$$c_v(s) = \sum_{j=0}^{p}\sum_{l=0}^{p}\tilde\phi_j\tilde\phi_l\,c(s+j-l) = c_v(-s), \qquad s = 0, 1, \dots, q,$$
where $\tilde\phi_0 = -1$, and
$$\tilde f_v(\lambda) = \frac{1}{2\pi}\sum_{s=-q}^{q}c_v(s)e^{-is\lambda} = \frac{1}{2\pi}\Big[c_v(0) + 2\sum_{s=1}^{q}c_v(s)\cos s\lambda\Big], \qquad -\pi \le \lambda \le \pi.$$
These are consistent estimates of the autocovariances and spectral density, respectively, of the moving average process $v_t = \epsilon_t + \sum_{i=1}^{q}\gamma_i\epsilon_{t-i}$. If $\tilde f_v(\lambda) > 0$ for $-\pi \le \lambda \le \pi$, then we may factorize it in the form
$$\tilde f_v(\lambda) = \frac{\tilde\sigma^2}{2\pi}\Big|\sum_{h=0}^{q}\tilde\gamma_he^{ih\lambda}\Big|^2,$$
where the $\tilde\gamma_h$ are real with $\tilde\gamma_0 = 1$. The consistent estimates $\tilde\gamma_h$ are obtained as the solution to the equations
$$c_v(h) = \tilde\sigma^2\sum_{j=0}^{q-h}\tilde\gamma_j\tilde\gamma_{j+h}, \qquad (h = 0, 1, \dots, q),$$
which may be solved using an algorithm of Wilson [25]. Consistent estimates of the $\gamma_h$ may also be obtained by a method which does not require factorizing the spectral density $\tilde f_v(\lambda)$ (see Hannan [12]).
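Steps (b) and (c) can be sketched as follows. This is an illustrative sketch, not the paper's code, and the function names are hypothetical. The moving average factorization takes the roots of $z^q\sum_{s=-q}^{q}c_v(s)z^s$ and keeps those outside the unit circle; this root-based factorization is a substitute for Wilson's algorithm, not an implementation of it, and like the text it requires $\tilde f_v(\lambda) > 0$.

```python
import numpy as np

def sample_autocov(u, max_lag):
    """c(s) = (1/T) sum_{t=s+1}^T (u_t - ubar)(u_{t-s} - ubar), s = 0, ..., max_lag."""
    d = np.asarray(u, dtype=float) - np.mean(u)
    T = len(d)
    return np.array([d[s:] @ d[:T - s] / T for s in range(max_lag + 1)])

def extended_yule_walker(c, p, q):
    """Solve c(s) - sum_i phi_i c(s - i) = 0 for s = q+1, ..., q+p, using c(-s) = c(s)."""
    if p == 0:
        return np.zeros(0)
    M = np.array([[c[abs(s - i)] for i in range(1, p + 1)] for s in range(q + 1, q + p + 1)])
    rhs = np.array([c[s] for s in range(q + 1, q + p + 1)])
    return np.linalg.solve(M, rhs)

def ma_autocov(c, phi, q):
    """c_v(s) = sum_{j,l=0}^p phi~_j phi~_l c(s + j - l) with phi~_0 = -1, s = 0, ..., q."""
    ph = np.concatenate(([-1.0], phi))
    P = len(ph)
    return np.array([sum(ph[j] * ph[l] * c[abs(s + j - l)] for j in range(P) for l in range(P))
                     for s in range(q + 1)])

def ma_factor(cv):
    """gamma_1..gamma_q and sigma2 with c_v(s) = sigma2 sum_h gamma_h gamma_{h+s}, gamma_0 = 1,
    keeping the invertible factor (all roots of Gamma(z) outside the unit circle)."""
    cv = np.asarray(cv, dtype=float)
    q = len(cv) - 1
    if q == 0:
        return np.zeros(0), float(cv[0])
    # z^q * sum_{s=-q}^{q} c_v(s) z^s, highest power first (coefficients are symmetric)
    roots = np.roots(np.concatenate((cv[::-1], cv[1:])))
    outside = roots[np.abs(roots) > 1.0]
    if len(outside) != q:
        raise ValueError("the estimated spectral density must be strictly positive")
    gam = np.array([1.0 + 0j])
    for z0 in outside:
        gam = np.convolve(gam, [1.0, -1.0 / z0])   # multiply by (1 - z / z0)
    gam = gam.real
    return gam[1:], float(cv[q] / gam[q])
```

With the exact ARMA(1,1) autocovariances $c = (2.08, 1.44, 0.72)$ (for $\phi_1 = 0.5$, $\gamma_1 = 0.4$, $\sigma^2 = 1$), the three steps recover $\tilde\phi_1 = 0.5$, $c_v = (1.16, 0.4)$, $\tilde\gamma_1 = 0.4$ and $\tilde\sigma^2 = 1$.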


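Now we let
$$\tilde A(\mathcal{L}) = 1 - \tilde\alpha_1\mathcal{L} - \cdots - \tilde\alpha_r\mathcal{L}^r, \qquad \tilde\Phi(\mathcal{L}) = 1 - \tilde\phi_1\mathcal{L} - \cdots - \tilde\phi_p\mathcal{L}^p, \qquad \tilde G = I + \sum_{i=1}^{q}\tilde\gamma_iL^i,$$
$$\tilde U = \tilde A(\mathcal{L})Y - \sum_{i=1}^{k}\tilde\beta_iX_i, \qquad \tilde\epsilon = \tilde G^{-1}\tilde\Phi(\mathcal{L})\tilde U,$$
$$\tilde W = (\mathcal{L}\tilde\Phi(\mathcal{L})Y, \dots, \mathcal{L}^r\tilde\Phi(\mathcal{L})Y, \tilde\Phi(\mathcal{L})X_1, \dots, \tilde\Phi(\mathcal{L})X_k, \mathcal{L}\tilde U, \dots, \mathcal{L}^p\tilde U, L\tilde\epsilon, \dots, L^q\tilde\epsilon).$$
Then, using (11) and (13) in equation (12), we obtain the following Newton-Raphson equations for $\hat\theta$:
$$\tilde W'\tilde G'^{-1}\tilde G^{-1}\tilde W(\hat\theta - \theta_0) = \tilde W'\tilde G'^{-1}\tilde G^{-1}\tilde\Phi(\mathcal{L})\tilde U.$$
Since
$$\begin{aligned}\tilde W\theta_0 + \tilde\Phi(\mathcal{L})\tilde U &= \sum_{i=1}^{r}\tilde\alpha_i\mathcal{L}^i\tilde\Phi(\mathcal{L})Y + \sum_{i=1}^{k}\tilde\beta_i\tilde\Phi(\mathcal{L})X_i + \sum_{i=1}^{p}\tilde\phi_i\mathcal{L}^i\tilde U + \sum_{i=1}^{q}\tilde\gamma_iL^i\tilde\epsilon + \tilde\Phi(\mathcal{L})\tilde U\\ &= \tilde\Phi(\mathcal{L})(Y - \tilde U) + \tilde U - \tilde\epsilon + \tilde\Phi(\mathcal{L})\tilde U\\ &= \tilde\Phi(\mathcal{L})Y + \tilde U - \tilde\epsilon,\end{aligned}$$
the Newton-Raphson equations can be written in the form
$$\tilde W'\tilde G'^{-1}\tilde G^{-1}\tilde W\hat\theta = \tilde W'\tilde G'^{-1}\tilde G^{-1}(\tilde\Phi(\mathcal{L})Y + \tilde U - \tilde\epsilon). \tag{14}$$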

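The Newton-Raphson solution $\hat\theta$ to equation (14) can be interpreted as the generalized least squares solution to the identity
$$\begin{aligned}\tilde\Phi(\mathcal{L})Y + \tilde U - \tilde\epsilon &= \sum_{i=1}^{r}\alpha_i\mathcal{L}^i\tilde\Phi(\mathcal{L})Y + \sum_{i=1}^{k}\beta_i\tilde\Phi(\mathcal{L})X_i + \sum_{i=1}^{p}\phi_i\mathcal{L}^i\tilde U + \sum_{i=1}^{q}\gamma_iL^i\tilde\epsilon + \tilde G\epsilon\\ &\quad + \sum_{i=1}^{p}(\tilde\phi_i - \phi_i)(\mathcal{L}^i\tilde U - \mathcal{L}^iU) + \sum_{i=1}^{q}(\tilde\gamma_i - \gamma_i)(L^i\tilde\epsilon - L^i\epsilon),\end{aligned}\tag{15}$$
where the last two terms on the right-hand side of the equation (products of two estimation errors, hence of second order) are to be neglected, and the error term $\tilde G\epsilon = \epsilon + \sum_{i=1}^{q}\tilde\gamma_iL^i\epsilon$ is treated as having covariance matrix $\sigma^2\tilde G\tilde G'$.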

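We conclude this section by discussing briefly the computations that are needed to complete the estimation procedure and obtain the estimator $\hat\theta$ as the solution to equation (14). Once the initial estimate $\theta_0$ and the residuals $\tilde u_t$ have been obtained, we compute
$$\tilde\Phi(\mathcal{L})y_t = y_t - \sum_{i=1}^{p}\tilde\phi_iy_{t-i}, \qquad t = p+1-r, \dots, T,$$
and similarly compute $\tilde\Phi(\mathcal{L})x_{ti}$, $i = 1, \dots, k$, and $\tilde\Phi(\mathcal{L})\tilde u_t$ for $t = p+1, \dots, T$. Next we obtain the vector $\tilde\epsilon = \tilde G^{-1}\tilde\Phi(\mathcal{L})\tilde U$ recursively from the equation $\tilde G\tilde\epsilon = \tilde\Phi(\mathcal{L})\tilde U$ as
$$\tilde\epsilon_t = \tilde\Phi(\mathcal{L})\tilde u_t - \sum_{i=1}^{q}\tilde\gamma_i\tilde\epsilon_{t-i}, \qquad t = p+1, \dots, T,$$
where $\tilde\epsilon_{p+1-q} = \cdots = \tilde\epsilon_p = 0$. Then, forming the matrix $\tilde W$ as defined above, we compute the columns of the matrix of "independent" variables $W^* = \tilde G^{-1}\tilde W$ recursively from $\tilde GW^* = \tilde W$, in the same way as $\tilde\epsilon$, and we also compute the vector of the "dependent" variable $Y^* = \tilde G^{-1}(\tilde\Phi(\mathcal{L})Y + \tilde U - \tilde\epsilon)$ recursively from $\tilde GY^* = \tilde\Phi(\mathcal{L})Y + \tilde U - \tilde\epsilon$. Finally, the estimator $\hat\theta$ is simply the least squares solution to the regression of $Y^*$ on $W^*$, i.e.,
$$\hat\theta = (W^{*\prime}W^*)^{-1}W^{*\prime}Y^*.$$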
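The final regression step can be sketched as below, assuming the matrix $\tilde W$ and the vector $\tilde\Phi(\mathcal{L})Y + \tilde U - \tilde\epsilon$ have already been formed. Both are filtered by the recursion $z^*_t = z_t - \sum_i\tilde\gamma_iz^*_{t-i}$ with zero start-up values, which solves $\tilde Gz^* = z$ without forming or inverting $\tilde G$, and $\hat\theta$ is then an ordinary least squares fit. The function names are hypothetical.

```python
import numpy as np

def apply_G_inverse(Z, gamma):
    """Return G^{-1} Z by the recursion z*_t = z_t - sum_i gamma_i z*_{t-i}
    (zero start-up values), i.e. solve G Z* = Z column by column."""
    Z = np.asarray(Z, dtype=float)
    out = np.zeros_like(Z)
    for t in range(Z.shape[0]):
        out[t] = Z[t]
        for i, g in enumerate(gamma, start=1):
            if t - i >= 0:
                out[t] = out[t] - g * out[t - i]
    return out

def newton_raphson_step(W_tilde, dep, gamma_tilde):
    """theta_hat = (W*'W*)^{-1} W*'Y* with W* = G~^{-1} W~ and Y* = G~^{-1} dep,
    where dep = Phi~(L)Y + U~ - eps~; this is the least squares form of (14)."""
    W_star = apply_G_inverse(W_tilde, gamma_tilde)
    Y_star = apply_G_inverse(dep, gamma_tilde)
    theta_hat, *_ = np.linalg.lstsq(W_star, Y_star, rcond=None)
    return theta_hat, W_star, Y_star
```

Solving the triangular system by recursion costs $O(Tq)$ operations per column, compared with $O(T^3)$ for an explicit inverse of $\tilde G$, which is the practical reason for the recursive computation described above.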


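3. THE ASYMPTOTIC DISTRIBUTION OF THE ESTIMATOR

Since the exact finite sample distribution of the proposed estimator $\hat\theta$ is too complicated to be obtained in closed form, we will consider only asymptotic properties of the estimator as $T\to\infty$. To describe the asymptotic distribution of $\hat\theta$, we partition $W$ conformably with $\theta$ as $W = (W_\alpha, W_\beta, W_\phi, W_\gamma)$, where $W_\alpha = (\mathcal{L}\Phi(\mathcal{L})Y, \dots, \mathcal{L}^r\Phi(\mathcal{L})Y)$, $W_\beta = (\Phi(\mathcal{L})X_1, \dots, \Phi(\mathcal{L})X_k)$, $W_\phi = (\mathcal{L}U, \dots, \mathcal{L}^pU)$ and $W_\gamma = (L\epsilon, \dots, L^q\epsilon)$, and we introduce the matrix
$$V = \begin{pmatrix} V_{\alpha\alpha} & V_{\alpha\beta} & V_{\alpha\phi} & V_{\alpha\gamma}\\ V_{\alpha\beta}' & V_{\beta\beta} & 0 & 0\\ V_{\alpha\phi}' & 0 & V_{\phi\phi} & V_{\phi\gamma}\\ V_{\alpha\gamma}' & 0 & V_{\phi\gamma}' & V_{\gamma\gamma}\end{pmatrix}, \qquad V_{ab} = \lim_{T\to\infty}\frac{1}{T}E(W_a'G'^{-1}G^{-1}W_b). \tag{16}$$
The blocks pairing $W_\beta$ with $W_\phi$ or $W_\gamma$ vanish because the $x_{ti}$ are nonstochastic and $U$ and $\epsilon$ have mean 0. The blocks that do not involve the exogenous variables have $(m,n)$th elements
$$(V_{\alpha\gamma})_{mn} = H_{m-n} = \frac{\sigma^2}{2\pi}\int_{-\pi}^{\pi}\frac{e^{i(m-n)\lambda}}{A(e^{i\lambda})\Gamma(e^{-i\lambda})}\,d\lambda, \qquad (V_{\gamma\gamma})_{mn} = \frac{\sigma^2}{2\pi}\int_{-\pi}^{\pi}\frac{e^{i(m-n)\lambda}}{|\Gamma(e^{i\lambda})|^2}\,d\lambda,$$
$$(V_{\alpha\phi})_{mn} = \frac{\sigma^2}{2\pi}\int_{-\pi}^{\pi}\frac{e^{i(m-n)\lambda}}{A(e^{i\lambda})\Phi(e^{-i\lambda})}\,d\lambda, \qquad (V_{\phi\phi})_{mn} = \frac{\sigma^2}{2\pi}\int_{-\pi}^{\pi}\frac{e^{i(m-n)\lambda}}{|\Phi(e^{i\lambda})|^2}\,d\lambda, \qquad (V_{\phi\gamma})_{mn} = \frac{\sigma^2}{2\pi}\int_{-\pi}^{\pi}\frac{e^{i(m-n)\lambda}}{\Phi(e^{i\lambda})\Gamma(e^{-i\lambda})}\,d\lambda,$$
while $V_{\alpha\alpha}$, $V_{\alpha\beta}$ and $V_{\beta\beta}$ also depend on the exogenous variables, through $\beta = (\beta_1, \dots, \beta_k)'$ and the spectral distribution matrix $F_x(\lambda)$ of condition (vi).

We can then state the following theorem.

Theorem. For the model (1) and (2) under assumptions (i)-(iii) given in Section 1, let $\hat\theta$ denote the estimator of $\theta = (\alpha_1, \dots, \alpha_r, \beta_1, \dots, \beta_k, \phi_1, \dots, \phi_p, \gamma_1, \dots, \gamma_q)'$ obtained from equation (14). Then the distribution of $\sqrt{T}(\hat\theta - \theta)$ converges, as $T\to\infty$, to a multivariate normal distribution with mean vector 0 and covariance matrix equal to $\sigma^2V^{-1}$, where $V$ is defined by (16).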


Before discussing the proof of the theorem, we make the following comments:

(a) The matrix $V$ defined by (16) can be seen to equal
$$-\lim_{T\to\infty}\frac{\sigma^2}{T}E\Big(\frac{\partial^2\log F}{\partial\theta\,\partial\theta'}\Big) = \lim_{T\to\infty}\frac{1}{T}E(W'G'^{-1}G^{-1}W).$$
Thus the asymptotic distribution of $\hat\theta$ is identical to the asymptotic distribution of the maximum likelihood estimator based on the assumption of normality of the $\epsilon_t$, so that, when the disturbances are normally distributed, $\hat\theta$ is asymptotically as efficient as the maximum likelihood estimator.

(b) The proof to be given will show that $\hat\theta$ converges to $\theta$ in probability as $T\to\infty$.

(c) An asymptotically efficient estimate of $\sigma^2$ is given by
$$\hat\sigma^2 = \frac{1}{T-p}\Big(\hat A(\mathcal{L})\hat\Phi(\mathcal{L})Y - \sum_{i=1}^{k}\hat\beta_i\hat\Phi(\mathcal{L})X_i\Big)'\hat G'^{-1}\hat G^{-1}\Big(\hat A(\mathcal{L})\hat\Phi(\mathcal{L})Y - \sum_{i=1}^{k}\hat\beta_i\hat\Phi(\mathcal{L})X_i\Big),$$
where the ^ denotes that these quantities are to be evaluated at $\hat\theta$. The estimator $\hat\sigma^2$ will be asymptotically uncorrelated with $\hat\theta$. Similarly, the covariance matrix of $\hat\theta$ can be estimated by $\hat\sigma^2(\hat W'\hat G'^{-1}\hat G^{-1}\hat W)^{-1}$.

(d) The asymptotic properties of the estimator $\hat\theta$ do not require an iterative procedure, only an initial consistent estimate. However, in practice one may want to iterate the least squares solution to (14). In particular, we suggest a second iteration, so that an asymptotically efficient estimate of the covariance matrix of the estimator is obtained simply as a by-product of the least squares estimation.

(e) As we have mentioned in Section 1, the estimation of various special cases of the model (1)-(2) has been considered by others.


We now compare the estimator proposed in this paper with some of the previously proposed estimators for these special cases. For the time series ARMA model
$$A(\mathcal{L})y_t = \Gamma(\mathcal{L})\epsilon_t,$$
equation (15) reduces to
$$Y + \sum_{i=1}^{q}\tilde\gamma_iL^i\tilde\epsilon = \sum_{i=1}^{r}\alpha_i\mathcal{L}^iY + \sum_{i=1}^{q}\gamma_iL^i\tilde\epsilon + \tilde G\epsilon + \sum_{i=1}^{q}(\tilde\gamma_i - \gamma_i)(L^i\tilde\epsilon - L^i\epsilon).$$
The generalized least squares estimation procedure which results from this equation is similar to Anderson's (see [4]) Newton-Raphson method except for the treatment of initial values of the $y_t$; i.e., Anderson uses $L^iY$ in place of $\mathcal{L}^iY$. In the distributed lag model
$$A(\mathcal{L})y_t = \sum_{i=1}^{k}\beta_ix_{ti} + A(\mathcal{L})\epsilon_t,$$
equation (15) takes the form
$$Y - \sum_{i=1}^{r}\tilde\alpha_iL^i\tilde\epsilon = \sum_{i=1}^{r}\alpha_i(\mathcal{L}^iY - L^i\tilde\epsilon) + \sum_{i=1}^{k}\beta_iX_i + \tilde{\mathcal{A}}\epsilon - \sum_{i=1}^{r}(\tilde\alpha_i - \alpha_i)(L^i\tilde\epsilon - L^i\epsilon),$$
where $\tilde{\mathcal{A}} = I - \sum_{i=1}^{r}\tilde\alpha_iL^i$. The estimation of this equation by generalized least squares is related to a method suggested by Dhrymes [8], Chapter 9. For the general moving average errors model
$$A(\mathcal{L})y_t = \sum_{i=1}^{k}\beta_ix_{ti} + \Gamma(\mathcal{L})\epsilon_t,$$
the identity (15) becomes
$$Y + \sum_{i=1}^{q}\tilde\gamma_iL^i\tilde\epsilon = \sum_{i=1}^{r}\alpha_i\mathcal{L}^iY + \sum_{i=1}^{k}\beta_iX_i + \sum_{i=1}^{q}\gamma_iL^i\tilde\epsilon + \tilde G\epsilon + \sum_{i=1}^{q}(\tilde\gamma_i - \gamma_i)(L^i\tilde\epsilon - L^i\epsilon).$$
The estimation of this identity by generalized least squares is similar to the method of Phillips [25] except for the treatment of the values of the initial disturbances $\epsilon_t$; i.e., Phillips considers these values as parameters to be estimated. Finally, for the pure autoregressive errors model
$$A(\mathcal{L})y_t = \sum_{i=1}^{k}\beta_ix_{ti} + u_t, \qquad \Phi(\mathcal{L})u_t = \epsilon_t,$$
Hatanaka [17] has suggested a method identical to the least squares estimation of the identity
$$\tilde\Phi(\mathcal{L})Y + \sum_{i=1}^{p}\tilde\phi_i\mathcal{L}^i\tilde U = \sum_{i=1}^{r}\alpha_i\mathcal{L}^i\tilde\Phi(\mathcal{L})Y + \sum_{i=1}^{k}\beta_i\tilde\Phi(\mathcal{L})X_i + \sum_{i=1}^{p}\phi_i\mathcal{L}^i\tilde U + \epsilon + \sum_{i=1}^{p}(\tilde\phi_i - \phi_i)(\mathcal{L}^i\tilde U - \mathcal{L}^iU),$$
which is simply (15) in this special case.
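To illustrate the two-step idea in its simplest special case, the following sketch applies (15) with $q = 0$ to $y_t = \alpha_1y_{t-1} + \beta_1x_t + u_t$, $u_t = \phi_1u_{t-1} + \epsilon_t$. Assumptions: step (a) uses the instruments $(x_{t-1}, x_t)$, $\tilde\phi_1$ is the lag-1 sample autocorrelation of the residuals (step (b) with $q = 0$), and the returned $\hat\sigma^2$ and covariance estimate $\hat\sigma^2(W'W)^{-1}$ are the least squares by-products of a single step rather than the re-evaluated quantities of comment (c). The function name is hypothetical and this is not Hatanaka's published code.

```python
import numpy as np

def hatanaka_two_step(y, x):
    """One Newton-Raphson (least squares) step for
        y_t = a*y_{t-1} + b*x_t + u_t,   u_t = f*u_{t-1} + e_t,
    i.e. identity (15) with r = k = p = 1 and q = 0 (so G = I).
    Returns (theta_hat = (a, b, f), sigma2_hat, covariance estimate)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Step (a): instrumental variables for (y_{t-1}, x_t) with instruments (x_{t-1}, x_t)
    Z = np.column_stack([x[:-1], x[1:]])
    R = np.column_stack([y[:-1], x[1:]])
    a0, b0 = np.linalg.solve(Z.T @ R, Z.T @ y[1:])
    u = np.full_like(y, np.nan)
    u[1:] = y[1:] - a0 * y[:-1] - b0 * x[1:]          # residuals u~_t, t = 1..T-1
    # Step (b) with q = 0: phi~ = c(1) / c(0)
    d = u[1:] - u[1:].mean()
    f0 = (d[1:] @ d[:-1]) / (d @ d)
    # Regression form of (15), rows t = 2..T-1
    fy = y[1:] - f0 * y[:-1]                           # Phi~(L) y_t, t = 1..T-1
    fx = x[1:] - f0 * x[:-1]                           # Phi~(L) x_t, t = 1..T-1
    W = np.column_stack([fy[:-1], fx[1:], u[1:-1]])    # (L Phi~ y, Phi~ x, L u~)
    dep = fy[1:] + f0 * u[1:-1]                        # Phi~(L) y_t + phi~ u~_{t-1}
    theta, *_ = np.linalg.lstsq(W, dep, rcond=None)
    resid = dep - W @ theta
    sigma2 = resid @ resid / len(dep)
    cov = sigma2 * np.linalg.inv(W.T @ W)
    return theta, sigma2, cov
```

With $G = I$ the generalized least squares step reduces to ordinary least squares, which is why no filtering by $\tilde G^{-1}$ appears here.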

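Proof of Theorem: We shall not go into great detail here but merely give an outline of the proof. First, we can ignore the effect of the modification of the initial disturbances $\epsilon_t$, and hence the use of the lag matrix $G$ in place of the lag operator $\Gamma(\mathcal{L})$, since the modification has a negligible effect as $T\to\infty$ and the asymptotic properties of the estimator are not affected by it (see Anderson [4]). Then, using the (modified) identity (15) and equation (14), we have
$$\begin{aligned}\hat\theta &= (\tilde W'\tilde G'^{-1}\tilde G^{-1}\tilde W)^{-1}\tilde W'\tilde G'^{-1}\tilde G^{-1}\big(\tilde\Phi(\mathcal{L})Y + \tilde U - \tilde\epsilon\big)\\ &= (\tilde W'\tilde G'^{-1}\tilde G^{-1}\tilde W)^{-1}\tilde W'\tilde G'^{-1}\tilde G^{-1}\Big(\tilde W\theta + \tilde G\epsilon + \sum_{i=1}^{p}(\tilde\phi_i - \phi_i)(\mathcal{L}^i\tilde U - \mathcal{L}^iU) + \sum_{i=1}^{q}(\tilde\gamma_i - \gamma_i)(L^i\tilde\epsilon - L^i\epsilon)\Big).\end{aligned}\tag{17}$$
It follows that
$$\begin{aligned}\sqrt{T}(\hat\theta - \theta) &= \Big(\tfrac{1}{T}\tilde W'\tilde G'^{-1}\tilde G^{-1}\tilde W\Big)^{-1}\tfrac{1}{\sqrt T}\tilde W'\tilde G'^{-1}\epsilon\\ &\quad + \Big(\tfrac{1}{T}\tilde W'\tilde G'^{-1}\tilde G^{-1}\tilde W\Big)^{-1}\Big(\sum_{i=1}^{p}\sqrt{T}(\tilde\phi_i - \phi_i)\,\tfrac{1}{T}\tilde W'\tilde G'^{-1}\tilde G^{-1}(\mathcal{L}^i\tilde U - \mathcal{L}^iU)\\ &\qquad + \sum_{i=1}^{q}\sqrt{T}(\tilde\gamma_i - \gamma_i)\,\tfrac{1}{T}\tilde W'\tilde G'^{-1}\tilde G^{-1}(L^i\tilde\epsilon - L^i\epsilon)\Big).\end{aligned}\tag{18}$$

Now each of the terms $\sqrt{T}(\tilde\phi_i - \phi_i)\frac{1}{T}\tilde W'\tilde G'^{-1}\tilde G^{-1}(\mathcal{L}^i\tilde U - \mathcal{L}^iU)$ on the right-hand side of (18) has a probability limit equal to 0 as $T\to\infty$. This is true since $\sqrt{T}(\tilde\phi_i - \phi_i)$ is bounded in probability as $T\to\infty$ by the consistency of $\tilde\phi_i$ to the order $T^{-1/2}$, while $\frac{1}{T}\tilde W'\tilde G'^{-1}\tilde G^{-1}(\mathcal{L}^i\tilde U - \mathcal{L}^iU)$ converges to 0 in probability as $T\to\infty$, again by consistency of the initial estimates. The same argument also applies to the terms involving $\tilde\gamma_i - \gamma_i$ on the right side of (18). Hence we can conclude that the entire second term on the right-hand side of (18) converges to 0 in probability as $T\to\infty$, since we will show that the matrix $(\frac{1}{T}\tilde W'\tilde G'^{-1}\tilde G^{-1}\tilde W)^{-1}$ has a finite probability limit as $T\to\infty$. It also follows by the consistency of the initial estimate $\theta_0$ that if the matrix $\frac{1}{T}W'G'^{-1}G^{-1}W$ possesses a finite probability limit as $T\to\infty$, then the matrix $\frac{1}{T}W'G'^{-1}G^{-1}W\big|_{\theta_0} = \frac{1}{T}\tilde W'\tilde G'^{-1}\tilde G^{-1}\tilde W$ will have the same probability limit. And finally, the limiting distribution of the vector $\frac{1}{\sqrt T}\tilde W'\tilde G'^{-1}\epsilon$ will be the same as that of $\frac{1}{\sqrt T}W'G'^{-1}\epsilon$, since the difference between the two vectors, $\frac{1}{\sqrt T}(\tilde G^{-1}\tilde W - G^{-1}W)'\epsilon$, converges to 0 in probability as $T\to\infty$. This follows from the fact that all the elements in the vector of differences above are essentially products of two terms, one of which involves $\sqrt{T}$ times elements of the difference $(\theta_0 - \theta)$ and the other of which involves elements of a vector of the form $\frac{1}{T}W'G'^{-1}\epsilon$. Then, since $\operatorname{plim}_{T\to\infty}\frac{1}{T}W'G'^{-1}\epsilon = 0$ and $\theta_0$ is a consistent estimate of $\theta$ to the order $T^{-1/2}$ in probability, arguments similar to those used for the other terms in equation (18) can also be applied to the vector of differences $\frac{1}{\sqrt T}(\tilde G^{-1}\tilde W - G^{-1}W)'\epsilon$. Thus we see from the above arguments that the limiting distribution of $\sqrt{T}(\hat\theta - \theta)$ will be identical to that of $(\frac{1}{T}W'G'^{-1}G^{-1}W)^{-1}\frac{1}{\sqrt T}W'G'^{-1}\epsilon$. Hence the results of the Theorem will be established once we show that

(I) $\operatorname{plim}_{T\to\infty}\frac{1}{T}W'G'^{-1}G^{-1}W = \lim_{T\to\infty}\frac{1}{T}E(W'G'^{-1}G^{-1}W) = V$, where the matrix $V$ is defined by (16), and

(II) $\frac{1}{\sqrt T}W'G'^{-1}\epsilon$ has a limiting normal distribution with mean vector 0 and covariance matrix equal to $\sigma^2V$.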


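Proof of (I). We consider the probability limit of a typical element of the matrix $\frac{1}{T}W'G'^{-1}G^{-1}W$. For example, using equations (9), (10), and (5), and the fact that $E(x_{ti}\epsilon_s) = 0$ for all $t, s = \dots, -1, 0, 1, \dots$ and $i = 1, \dots, k$, we have
$$\begin{aligned}\lim_{T\to\infty}\frac{\sigma^2}{T}E\Big(-\frac{\partial^2\log F}{\partial\alpha_m\,\partial\gamma_n}\Big) &= \lim_{T\to\infty}\frac{1}{T}E\big[(\mathcal{L}^m\Phi(\mathcal{L})Y)'G'^{-1}G^{-1}(L^n\epsilon)\big]\\ &= \lim_{T\to\infty}\frac{1}{T}E\Big[\Big(\sum_{i=1}^{k}\beta_i\mathcal{L}^mA(\mathcal{L})^{-1}\Phi(\mathcal{L})X_i + \mathcal{L}^mA(\mathcal{L})^{-1}\Phi(\mathcal{L})U\Big)'G'^{-1}G^{-1}(L^n\epsilon)\Big]\\ &= \lim_{T\to\infty}\frac{1}{T}E\big[(\mathcal{L}^mA(\mathcal{L})^{-1}\epsilon)'(G^{-1}L^n\epsilon)\big]\\ &= \lim_{T\to\infty}\frac{1}{T}E\Big[\sum_{t=p+n+1}^{T}\sum_{u=0}^{t-p-n-1}\sum_{v=0}^{\infty}\delta_u\lambda_v\epsilon_{t-n-u}\epsilon_{t-m-v}\Big]\\ &= \lim_{T\to\infty}\frac{\sigma^2}{T}\sum_{t=p+n+1}^{T}\begin{cases}\sum_{u=0}^{t-p-n-1}\delta_u\lambda_{u+n-m}, & m \le n\\ \sum_{u=0}^{t-p-m-1}\delta_{u+m-n}\lambda_u, & m > n\end{cases}\\ &= \lim_{T\to\infty}\frac{\sigma^2}{T}\begin{cases}\sum_{u=0}^{T-p-n-1}(T-n-p-u)\,\delta_u\lambda_{u+n-m}, & m \le n\\ \sum_{u=0}^{T-p-m-1}(T-m-p-u)\,\delta_{u+m-n}\lambda_u, & m > n\end{cases}\\ &= \begin{cases}\sigma^2\sum_{u=0}^{\infty}\delta_u\lambda_{u+n-m}, & m \le n\\ \sigma^2\sum_{u=0}^{\infty}\delta_{u+m-n}\lambda_u, & m > n\end{cases}\\ &= \frac{\sigma^2}{2\pi}\int_{-\pi}^{\pi}\frac{e^{i(m-n)\lambda}}{A(e^{i\lambda})\Gamma(e^{-i\lambda})}\,d\lambda = H_{m-n}, \qquad (m = 1, \dots, r;\ n = 1, \dots, q).\end{aligned}\tag{19}$$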
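The last two equalities in (19) can be checked numerically. The sketch below computes $H_{m-n}$ both from the (truncated) coefficient sums $\sigma^2\sum_u\delta_u\lambda_{u+n-m}$ and from the spectral integral, evaluating the integral as an average over an equally spaced frequency grid, which is very accurate here because the integrand is smooth and $2\pi$-periodic. Helper names are hypothetical and polynomials are given as ascending coefficient lists.

```python
import numpy as np

def inverse_series(poly_asc, n_terms):
    """Coefficients of 1/P(z) for P given in ascending powers with P[0] = 1."""
    d = len(poly_asc) - 1
    c = np.zeros(n_terms)
    c[0] = 1.0
    for j in range(1, n_terms):
        c[j] = -sum(poly_asc[i] * c[j - i] for i in range(1, min(j, d) + 1))
    return c

def H_series(A, Gam, m, n, sigma2=1.0, n_terms=400):
    """sigma^2 sum_u delta_u lambda_{u+n-m} (m <= n) or sigma^2 sum_u delta_{u+m-n} lambda_u (m > n)."""
    lam = inverse_series(A, n_terms)     # coefficients of A(z)^{-1}
    dlt = inverse_series(Gam, n_terms)   # coefficients of Gamma(z)^{-1}
    h = n - m
    if h >= 0:
        return sigma2 * sum(dlt[u] * lam[u + h] for u in range(n_terms - h))
    return sigma2 * sum(dlt[u - h] * lam[u] for u in range(n_terms + h))

def H_integral(A, Gam, m, n, sigma2=1.0, n_grid=2048):
    """(sigma^2 / 2pi) * integral of e^{i(m-n)l} / (A(e^{il}) Gamma(e^{-il})) over [-pi, pi),
    computed as the mean over an equally spaced grid."""
    lam = -np.pi + 2 * np.pi * np.arange(n_grid) / n_grid
    z = np.exp(1j * lam)
    denom = np.polyval(A[::-1], z) * np.polyval(Gam[::-1], np.conj(z))
    return sigma2 * (np.exp(1j * (m - n) * lam) / denom).mean().real
```

For $A(z) = 1 - 0.5z$ and $\Gamma(z) = 1 + 0.4z$ one gets, for instance, $H_0 = \sum_u(-0.2)^u = 1/1.2$ and $H_1 = -0.4/1.2$.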


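In obtaining the above limit, we have used the fact that the Cesàro sum of a convergent series converges to the sum of that series (see Anderson [3], Lemma 8.5.1). Also, to obtain $H_{m-n}$ as the probability limit of the quantity in (19), we first have
$$E\Big[\frac{1}{T}\big(\mathcal{L}^mA(\mathcal{L})^{-1}\Phi(\mathcal{L})X_i\big)'G'^{-1}G^{-1}(L^n\epsilon)\Big]^2 = \frac{1}{T^2}E\Big[\sum_{t=p+n+1}^{T}\sum_{u=0}^{t-p-n-1}\sum_{v=0}^{t-p-1}\delta_u\delta_v\,\big(A(\mathcal{L})^{-1}\Phi(\mathcal{L})x_{t-m-v,i}\big)\,\epsilon_{t-n-u}\Big]^2 \tag{20}$$
$$= \frac{1}{T^2}\sum_{t,s=p+n+1}^{T}\sum_{u=0}^{t-p-n-1}\sum_{j=0}^{s-p-n-1}\sum_{v=0}^{t-p-1}\sum_{l=0}^{s-p-1}\delta_u\delta_j\delta_v\delta_l\,\big(A(\mathcal{L})^{-1}\Phi(\mathcal{L})x_{t-m-v,i}\big)\big(A(\mathcal{L})^{-1}\Phi(\mathcal{L})x_{s-m-l,i}\big)\,E(\epsilon_{t-n-u}\epsilon_{s-n-j}),$$
where $E(\epsilon_{t-n-u}\epsilon_{s-n-j}) = \sigma^2$ if $s - j = t - u$ and 0 otherwise. Hence (20) is at most
$$\frac{\sigma^2}{T^2}\max_{p+1\le t\le T}\big|A(\mathcal{L})^{-1}\Phi(\mathcal{L})x_{t-m,i}\big|^2\sum_{t=p+n+1}^{T}\Big(\sum_{u=0}^{\infty}|\delta_u|\Big)^4 \le \frac{\sigma^2}{T}\max_{p+1\le t\le T}\big|A(\mathcal{L})^{-1}\Phi(\mathcal{L})x_{t-m,i}\big|^2\Big(\sum_{u=0}^{\infty}|\delta_u|\Big)^4,$$
which goes to 0 as $T\to\infty$ because of conditions (ii) and (iii) of Section 1 (see also Anderson [3], Lemma 2.6.1). Thus it follows by Tchebychev's inequality that the first of the two terms in (19) converges to 0 in probability as $T\to\infty$. The second term in (19) is
$$\frac{1}{T}(\mathcal{L}^mA(\mathcal{L})^{-1}\epsilon)'(G^{-1}L^n\epsilon) = \frac{1}{T}\sum_{t=p+n+1}^{T}\sum_{u=0}^{t-p-n-1}\sum_{v=0}^{\infty}\delta_u\lambda_v\epsilon_{t-n-u}\epsilon_{t-m-v}. \tag{21}$$
Now for fixed $v = 0, 1, \dots$,
$$E\Big|\frac{1}{T}\sum_{t=p+n+1}^{T}\sum_{u=0}^{t-p-n-1}\delta_u\epsilon_{t-n-u}\epsilon_{t-m-v}\Big| \le \frac{1}{T}\sum_{t=p+n+1}^{T}\sum_{u=0}^{t-p-n-1}|\delta_u|\,E|\epsilon_{t-n-u}\epsilon_{t-m-v}| \le \sigma^2\sum_{u=0}^{\infty}|\delta_u|,$$
so that
$$\lim_{s\to\infty}E\Big|\frac{1}{T}\sum_{v=s+1}^{\infty}\sum_{t=p+n+1}^{T}\sum_{u=0}^{t-p-n-1}\lambda_v\delta_u\epsilon_{t-n-u}\epsilon_{t-m-v}\Big| \le \lim_{s\to\infty}\sum_{v=s+1}^{\infty}|\lambda_v|\Big(\sigma^2\sum_{u=0}^{\infty}|\delta_u|\Big) = 0, \tag{22}$$
uniformly in $T$. Then Markov's inequality, $P(|X| > \eta) \le E|X|/\eta$ for any random variable $X$ and any $\eta > 0$, implies that the term in (22) converges to 0 in probability as $s\to\infty$, uniformly in $T$. Also, the quantities
$$\frac{1}{T}\sum_{t=p+n+1}^{T}\sum_{u=0}^{t-p-n-1}\delta_u\epsilon_{t-n-u}\epsilon_{t-m-v} = \frac{1}{T}\sum_{u=0}^{T-p-n-1}\sum_{t=p+n+u+1}^{T}\delta_u\epsilon_{t-n-u}\epsilon_{t-m-v}$$
have probability limits as $T\to\infty$ equal to $\sigma^2\delta_{v+m-n}$ for $v = \max(0, n-m), \dots$ (and equal to 0 for $v < n-m$). Thus it follows from this last result and (22) (see also Anderson [3], Theorem 7.7.1) that the second term in (19),
$$\frac{1}{T}\sum_{t=p+n+1}^{T}\sum_{u=0}^{t-p-n-1}\sum_{v=0}^{\infty}\delta_u\lambda_v\epsilon_{t-n-u}\epsilon_{t-m-v} = \sum_{v=0}^{s}\lambda_v\Big(\frac{1}{T}\sum_{t=p+n+1}^{T}\sum_{u=0}^{t-p-n-1}\delta_u\epsilon_{t-n-u}\epsilon_{t-m-v}\Big) + \frac{1}{T}\sum_{v=s+1}^{\infty}\sum_{t=p+n+1}^{T}\sum_{u=0}^{t-p-n-1}\delta_u\lambda_v\epsilon_{t-n-u}\epsilon_{t-m-v},$$
converges in probability as $T\to\infty$ to
$$\lim_{s\to\infty}\sigma^2\sum_{v=\max(0,\,n-m)}^{s}\lambda_v\delta_{v+m-n} = \begin{cases}\sigma^2\sum_{v=0}^{\infty}\delta_v\lambda_{v+n-m}, & m \le n,\\ \sigma^2\sum_{v=0}^{\infty}\delta_{v+m-n}\lambda_v, & m > n,\end{cases}$$
which is just as given in (19). (The argument used here is similar to that given in the proof of Theorem 1 of Hannan and Heyde [15], page 2060.) The same type of argument may be used to establish the probability limits for the other elements of the matrix $\frac{1}{T}W'G'^{-1}G^{-1}W$.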
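A small Monte Carlo sketch of the convergence just established for the second term in (19), for first-order $A$ and $\Gamma$ and $\sigma^2 = 1$: the sample cross moment $\frac{1}{T}\sum_t(A(\mathcal{L})^{-1}\epsilon)_{t-m}(G^{-1}\epsilon)_{t-n}$ should approach $\sigma^2\sum_u\delta_u\lambda_{u+n-m}$ (for $m \le n$, and the analogous sum for $m > n$) as $T$ grows. The function name, the sample size and the seed are arbitrary illustrative choices.

```python
import numpy as np

def sample_cross_moment(alpha, gamma, m, n, T=300_000, seed=0):
    """(1/T) sum_t (A(L)^{-1} eps)_{t-m} (G^{-1} eps)_{t-n} for A(z) = 1 - alpha z and
    Gamma(z) = 1 + gamma z, i.e. the second term of (19) up to end effects."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T)
    a = np.zeros(T)   # a_t = alpha a_{t-1} + eps_t
    b = np.zeros(T)   # b_t = eps_t - gamma b_{t-1}, zero start-up as in G^{-1}
    a[0] = b[0] = eps[0]
    for t in range(1, T):
        a[t] = alpha * a[t - 1] + eps[t]
        b[t] = eps[t] - gamma * b[t - 1]
    s = max(m, n)
    return np.mean(a[s - m:T - m] * b[s - n:T - n])
```

For $\alpha_1 = 0.5$ and $\gamma_1 = 0.4$ the limits are $0.5/1.2$ for $(m, n) = (1, 2)$ and $-0.4/1.2$ for $(m, n) = (2, 1)$; with $T = 300{,}000$ the simulated values typically agree with these to about two decimal places.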


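Proof of (II): To establish the asymptotic normality of $\frac{1}{\sqrt T}W'G'^{-1}\epsilon$, we consider the asymptotic behavior of a single component. For example, again using (9), (10), and (5), we have
$$\begin{aligned}\frac{\sigma^2}{\sqrt T}\frac{\partial\log F}{\partial\alpha_m} &= \frac{1}{\sqrt T}(\mathcal{L}^m\Phi(\mathcal{L})Y)'G'^{-1}\epsilon\\ &= \frac{1}{\sqrt T}\Big[\Big(\sum_{i=1}^{k}\beta_i\mathcal{L}^mA(\mathcal{L})^{-1}\Phi(\mathcal{L})X_i\Big)'G'^{-1}\epsilon + (\mathcal{L}^mA(\mathcal{L})^{-1}\epsilon)'\epsilon\Big]\\ &= \frac{1}{\sqrt T}\sum_{t=p+1}^{T}\Big[\Big(\sum_{u=0}^{t-p-1}\sum_{i=1}^{k}\delta_u\beta_iA(\mathcal{L})^{-1}\Phi(\mathcal{L})x_{t-m-u,i}\Big)\epsilon_t + \sum_{u=0}^{\infty}\lambda_u\epsilon_{t-m-u}\epsilon_t\Big]\\ &= \frac{1}{\sqrt T}\sum_{t=p+1}^{T}\Big[\Big(\sum_{u=0}^{t-p-1}\sum_{i=1}^{k}\delta_u\beta_iA(\mathcal{L})^{-1}\Phi(\mathcal{L})x_{t-m-u,i}\Big)\epsilon_t + \sum_{u=0}^{n}\lambda_u\epsilon_{t-m-u}\epsilon_t\Big] + \frac{1}{\sqrt T}\sum_{t=p+1}^{T}\sum_{u=n+1}^{\infty}\lambda_u\epsilon_{t-m-u}\epsilon_t\\ &= Z_{Tn} + R_{Tn},\end{aligned}\tag{23}$$
where
$$Z_{Tn} = \frac{1}{\sqrt T}\sum_{t=p+1}^{T}W_{tn}, \tag{24}$$
$$W_{tn} = \Big(\sum_{u=0}^{t-p-1}\sum_{i=1}^{k}\delta_u\beta_iA(\mathcal{L})^{-1}\Phi(\mathcal{L})x_{t-m-u,i}\Big)\epsilon_t + \sum_{u=0}^{n}\lambda_u\epsilon_{t-m-u}\epsilon_t, \qquad (t = p+1, \dots, T), \tag{25}$$
and
$$R_{Tn} = \frac{1}{\sqrt T}\sum_{t=p+1}^{T}\sum_{u=n+1}^{\infty}\lambda_u\epsilon_{t-m-u}\epsilon_t. \tag{26}$$
Now for all $T > p$,
$$E(R_{Tn}^2) = \frac{1}{T}\sum_{t,s=p+1}^{T}\sum_{u,v=n+1}^{\infty}\lambda_u\lambda_vE(\epsilon_{t-m-u}\epsilon_t\epsilon_{s-m-v}\epsilon_s) = \frac{\sigma^4}{T}\sum_{t=p+1}^{T}\sum_{u=n+1}^{\infty}\lambda_u^2 \le \sigma^4\sum_{u=n+1}^{\infty}\lambda_u^2 = M_n, \tag{27}$$
and $\lim_{n\to\infty}M_n = 0$ since $\sum_{u=0}^{\infty}\lambda_u^2$ converges. For fixed $n$, $W_{tn}$ has mean 0 and variance equal to
$$\sigma_{tn}^2 = \sigma^2\Big(\sum_{u=0}^{t-p-1}\sum_{i=1}^{k}\delta_u\beta_iA(\mathcal{L})^{-1}\Phi(\mathcal{L})x_{t-m-u,i}\Big)^2 + \sigma^4\sum_{u=0}^{n}\lambda_u^2.$$
The covariance between $W_{tn}$ and $W_{sn}$, $t \ne s$, is 0, and
$$\begin{aligned}E(W_{tn}^2 \mid \epsilon_{t-1}, \epsilon_{t-2}, \dots) &= \sigma^2\Big(\sum_{u=0}^{t-p-1}\sum_{i=1}^{k}\delta_u\beta_iA(\mathcal{L})^{-1}\Phi(\mathcal{L})x_{t-m-u,i}\Big)^2 + \sigma^2\sum_{u,v=0}^{n}\lambda_u\lambda_v\epsilon_{t-m-u}\epsilon_{t-m-v}\\ &\quad + 2\sigma^2\Big(\sum_{u=0}^{t-p-1}\sum_{i=1}^{k}\delta_u\beta_iA(\mathcal{L})^{-1}\Phi(\mathcal{L})x_{t-m-u,i}\Big)\sum_{u=0}^{n}\lambda_u\epsilon_{t-m-u}.\end{aligned}$$
Thus, letting $V_{Tn}^2 = \frac{1}{T}\sum_{t=p+1}^{T}E(W_{tn}^2 \mid \epsilon_{t-1}, \epsilon_{t-2}, \dots)$ and using arguments similar to those given in the proof of (I), we can see that as $T\to\infty$, $V_{Tn}^2$ converges in probability to
$$\sigma_n^2 = \lim_{T\to\infty}E(V_{Tn}^2) = \lim_{T\to\infty}\frac{1}{T}\sum_{t=p+1}^{T}\Big\{\sigma^2\Big(\sum_{u=0}^{t-p-1}\sum_{i=1}^{k}\delta_u\beta_iA(\mathcal{L})^{-1}\Phi(\mathcal{L})x_{t-m-u,i}\Big)^2 + \sigma^4\sum_{u=0}^{n}\lambda_u^2\Big\} = \sigma^2\mu_0 + \sigma^4\sum_{u=0}^{n}\lambda_u^2, \tag{28}$$
where $\mu_0 = \lim_{T\to\infty}\frac{1}{T}\sum_{t=p+1}^{T}\big(\sum_{u=0}^{t-p-1}\sum_{i=1}^{k}\delta_u\beta_iA(\mathcal{L})^{-1}\Phi(\mathcal{L})x_{t-m-u,i}\big)^2$ is the contribution of the exogenous variables to the $(m,m)$th element of the matrix $V$ defined in (16).

Also, $Z_{Tn} = \frac{1}{\sqrt T}\sum_{t=p+1}^{T}W_{tn}$ is a martingale which satisfies the Lindeberg condition
$$\frac{1}{T\,E(V_{Tn}^2)}\sum_{t=p+1}^{T}E\Big[W_{tn}^2\,I\Big(|W_{tn}| \ge \eta\sqrt{T\,E(V_{Tn}^2)}\Big)\Big] \to 0 \qquad \text{as } T\to\infty \tag{29}$$
for any $\eta > 0$, where $I(\cdot)$ denotes the indicator function. Condition (29) can be shown to hold by the same argument as given in the proof of Theorem 2, page 2065, in Hannan and Heyde [15]. (See also the proof of Theorem 2.6.1 in Anderson [3].)

Thus, through equations (28) and (29), $Z_{Tn}$ satisfies conditions (1) and (2) of Brown [6]. It follows by Theorem 2 given there that $Z_{Tn}$ has a limiting normal distribution as $T\to\infty$, with mean 0 and variance $\sigma_n^2$. Finally, using Theorem 7.7.1 in Anderson [3] and the result following equation (27), we can conclude that
$$\frac{\sigma^2}{\sqrt T}\frac{\partial\log F}{\partial\alpha_m} = \frac{1}{\sqrt T}(\mathcal{L}^m\Phi(\mathcal{L})Y)'G'^{-1}\epsilon = Z_{Tn} + R_{Tn}$$
has a limiting normal distribution as $T\to\infty$ with mean 0 and variance equal to
$$\sigma^2\mu_0 + \lim_{n\to\infty}\sigma^4\sum_{u=0}^{n}\lambda_u^2 = \sigma^2\Big(\mu_0 + \sigma^2\sum_{u=0}^{\infty}\lambda_u^2\Big) = \sigma^2v_{mm},$$
where $v_{mm}$ denotes the $(m,m)$th element of $V$.

The asymptotic normality of all other elements of $\frac{1}{\sqrt T}W'G'^{-1}\epsilon$ can be obtained in the same manner. A similar argument can also be used to show that the limiting distribution of an arbitrary linear combination of the elements of $\frac{1}{\sqrt T}W'G'^{-1}\epsilon$, namely $\frac{1}{\sqrt T}C'W'G'^{-1}\epsilon$ with $C = (c_1, \dots, c_{r+k+p+q})'$ an arbitrary constant vector, is normal with mean 0 and variance $\sigma^2C'VC$, where $V$ is the matrix defined by (16). Then, using the continuity theorem for characteristic functions, we see that
$$\frac{1}{\sqrt T}W'G'^{-1}\epsilon = \frac{\sigma^2}{\sqrt T}\frac{\partial\log F}{\partial\theta}$$
has a limiting multivariate normal distribution $N(0, \sigma^2V)$ as $T\to\infty$, and thus the theorem is established.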